Assessing the Costs of Machine-Assisted Corpus Annotation through a User Study
Authors
Abstract
Fixed, limited budgets often constrain the amount of expert annotation that can go into the construction of annotated corpora. Estimating the cost of annotation is the first step toward using annotation resources wisely. We present a user study of annotation cost involving annotators at various skill levels and with varying backgrounds. Conducted over the web, the study consists of tests that simulate machine-assisted pre-annotation, in which the annotator corrects the pre-annotation rather than annotating from scratch. The study also includes tests representative of an annotation scenario involving Active Learning as it progresses from a naïve model to a knowledgeable model; in particular, annotators encounter pre-annotation of varying degrees of accuracy. The annotation interface lists the tags that the annotation model considers likely ahead of other tags. We present the experimental parameters of the study and report both descriptive and inferential statistics on its results. We conclude with a model for estimating the hourly cost of annotation for annotators of various skill levels, and we present models for two granularities of annotation: sentence at a time and word at a time.
Similar resources
Modeling the Annotation Process for Ancient Corpus Creation
In corpus creation, human annotation is expensive. Annotation costs can be minimized through machine learning and active learning; however, there are many complex interactions among the machine learner, the active learning technique, the annotation cost, human annotation accuracy, the annotator user interface, and several other elements of the process. For example, we show that changing the way i...
First Results in a Study Evaluating Pre-annotation and Correction Propagation for Machine-Assisted Syriac Morphological Analysis
Manual annotation of large textual corpora can be cost-prohibitive, especially for rare and under-resourced languages. One potential solution is pre-annotation: asking human annotators to correct sentences that have already been annotated, usually by a machine. Another potential solution is correction propagation: using annotator corrections to dynamically improve to the remaining pre-annotatio...
The Effects of Multimedia Annotations on Iranian EFL Learners’ L2 Vocabulary Learning
In our modern technological world, Computer-Assisted Language Learning (CALL) is a new realm for learning a language in general, and L2 vocabulary in particular. It is assumed that the use of multimedia annotations promotes language learners’ vocabulary acquisition. Therefore, this study set out to investigate the effects of different multimedia annotations (still picture annotatio...
Evaluating machine-assisted annotation in under-resourced settings
Machine assistance is vital to managing the cost of corpus annotation projects. Identifying effective forms of machine assistance through principled evaluation is particularly important and challenging in under-resourced domains and highly heterogeneous corpora, as the quality of machine assistance varies. We perform a fine-grained evaluation of two machine-assistance techniques in the context ...
A highly accurate Named Entity corpus for Hungarian
A highly accurate Named Entity (NE) corpus for Hungarian that is publicly available for research purposes is introduced in the paper, along with its main properties. The results of experiments that apply various Machine Learning models and classifier combination schemes are also presented to serve as a benchmark for further research based on the corpus. The data is a segment of the Szeged Corpu...